Classification with reject option in gene expression data
Authors
Abstract
MOTIVATION: The classification methods typically used in bioinformatics classify all examples, even if the classification is ambiguous, for instance, when the example is close to the separating hyperplane in linear classification. For medical applications, it may be better to classify an example only when there is a sufficiently high degree of accuracy, rather than classify all examples with decent accuracy. Moreover, when all examples are classified, the classification rule has no control over the accuracy of the classifier; the algorithm just aims to produce a classifier with the smallest error rate possible. In our approach, we fix the accuracy of the classifier and thereby choose a desired risk of error.
RESULTS: Our method consists of defining a rejection region in the feature space. This region contains the examples for which classification is ambiguous. These are rejected by the classifier. The accuracy of the classifier becomes a user-defined parameter of the classification rule. The task of the classification rule is to minimize the rejection region with the constraint that the error rate of the classifier be bounded by the chosen target error. This approach is also used in the feature-selection step. The results computed on both synthetic and real data show that classifier accuracy is significantly improved.
AVAILABILITY: Companion website: http://gsp.tamu.edu/Publications/rejectoption/
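The rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes a linear classifier that outputs a signed score (distance to the separating hyperplane), a rejection region that is symmetric around the hyperplane, and an error rate estimated on the same labeled sample used to pick the threshold. The function names `choose_reject_threshold` and `classify` and the toy data are hypothetical.

```python
def choose_reject_threshold(scores, labels, target_error):
    """Return the smallest threshold t such that, when examples with
    |score| < t are rejected, the error rate on the accepted examples
    is at most target_error. Returns None if no threshold qualifies.

    Assumption: labels are +1 / -1 and the predicted class is the sign
    of the score (a score of 0 is treated as +1).
    """
    # Candidate thresholds: 0 (reject nothing) and every observed |score|.
    candidates = sorted({0.0} | {abs(s) for s in scores})
    for t in candidates:
        accepted = [(s, y) for s, y in zip(scores, labels) if abs(s) >= t]
        if not accepted:
            break
        errors = sum(1 for s, y in accepted if (1 if s >= 0 else -1) != y)
        # Candidates are scanned in increasing order, so the first one that
        # meets the error constraint gives the smallest rejection region.
        if errors / len(accepted) <= target_error:
            return t
    return None


def classify(score, t):
    """Return +1 or -1, or None when the score lies in the rejection region."""
    if abs(score) < t:
        return None
    return 1 if score >= 0 else -1


# Toy data (made up for illustration): two misclassified examples near 0.
scores = [-2.0, -1.5, -0.4, -0.2, 0.1, 0.3, 1.2, 2.5]
labels = [-1, -1, 1, -1, -1, 1, 1, 1]

t = choose_reject_threshold(scores, labels, target_error=0.2)
decisions = [classify(s, t) for s in scores]
```

Scanning every candidate threshold, rather than stopping at the first drop in error, matters because the error rate on the accepted examples is not guaranteed to decrease monotonically as the rejection region grows. In practice the threshold would be chosen on data held out from training, since an in-sample error estimate is optimistic.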
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
The Data Replication Method for the Classification with Reject Option
Classification is one of the most important tasks of machine learning. Although the most well studied model is the two-class problem, in many scenarios there is the opportunity to label critical items for manual revision, instead of trying to automatically classify every item. In this paper we adapt a paradigm initially proposed for the classification of ordinal data to address the classificati...
Clustering of Gene Expression Data Using Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations, which is calculated by a distance function. As increa...
STUDY OF HMGA2 GENE INHIBITION WITH SPECIFIC SHRNA AND SIRNA AND INVESTIGATION OF CORRESPONDING EFFECTS ON DOWNSTREAM GENE EXPRESSION IN MDA-MB-231 CANCER CELLS: A BIOINFORMATIC AND EXPERIMENTAL STUDY
Background & Aims: The use of siRNA to silence gene expression is expanding rapidly. The aim of this study is to investigate, bioinformatically and experimentally, the inhibition of the HMGA2 gene and its effects on the expression of downstream genes in MDA-MB-231 cancer cells treated with shRNA and siRNA specific to HMGA2. Materials & Methods: To perform this bioinformatic a...
Journal title:
- Bioinformatics
Volume 24, Issue 17
Pages -
Publication date 2008